DalTeam@INLI-FIRE-2017: Native Language Identification using SVM with SGD Training

Authors

  • Dijana Kosmajac
  • Vlado Keselj
Abstract

Native Language Identification (NLI), a variant of the Language Identification task, focuses on determining an author's native language from a writing sample in a non-native language. In recent years, the challenging nature of NLI has drawn much attention from the research community. It is relevant in many fields, such as personalizing a language learning environment, personalized grammar correction, and authorship attribution in forensic linguistics. We participated in the INLI Shared Task 2017, held in conjunction with the FIRE 2017 conference. Our machine learning method for Native Language Identification uses character and word n-grams with an SVM (Support Vector Machine) classifier trained with SGD (Stochastic Gradient Descent). On the provided social media dataset we achieved an F1 measure of 89.60% under 10-fold cross-validation; the INLI workshop organisers reported 48.80% in the final testing.
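
The abstract names the technique (character and word n-grams fed to a linear SVM trained with SGD) but gives no implementation details. The following is a minimal pure-Python sketch of that general idea, not the authors' system. The n-gram orders, learning rate, regularisation constant, epoch count, one-vs-rest scheme, all function names, and the toy data are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams of `text` (lowercased)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def featurize(text, char_n=(2, 3), word_n=(1, 2)):
    """Map a text to a sparse, L2-normalised bag of n-gram counts.

    The n-gram orders are assumed values, not taken from the paper.
    """
    feats = Counter()
    for n in char_n:
        feats.update("c:" + g for g in char_ngrams(text.lower(), n))
    for n in word_n:
        feats.update("w:" + g for g in word_ngrams(text, n))
    norm = sum(v * v for v in feats.values()) ** 0.5 or 1.0
    return {k: v / norm for k, v in feats.items()}


def train_ovr_svm_sgd(texts, labels, epochs=50, lam=0.01, lr=0.1, seed=0):
    """One-vs-rest linear SVM: hinge loss plus L2 penalty, minimised by SGD.

    Hyperparameters are assumptions chosen for the toy example.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            for c in classes:
                w = weights[c]
                target = 1.0 if y == c else -1.0
                margin = target * sum(w[k] * v for k, v in x.items())
                # Simplification: shrink only the features active in x.
                for k in x:
                    w[k] *= 1.0 - lr * lam
                # Hinge-loss subgradient step when the margin is violated.
                if margin < 1.0:
                    for k, v in x.items():
                        w[k] += lr * target * v
    return weights


def predict(weights, text):
    """Return the class whose linear score is highest for `text`."""
    x = featurize(text)
    return max(
        weights,
        key=lambda c: sum(weights[c].get(k, 0.0) * v for k, v in x.items()),
    )


# Hypothetical toy data; the labels are placeholders, not INLI classes.
toy_texts = [
    "the train was late again this morning",
    "the train station is closed this week",
    "our team won the final match yesterday",
    "the match ended with a late goal",
]
toy_labels = ["class_a", "class_a", "class_b", "class_b"]

model = train_ovr_svm_sgd(toy_texts, toy_labels)
print(predict(model, "the train is late"))
```

A sparse dictionary representation keeps each update proportional to the number of n-grams in one text, which is the usual reason for pairing SGD with high-dimensional n-gram features. A real system would typically use a tested library implementation and proper cross-validation rather than this sketch.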

Similar resources

Mangalore-University@INLI-FIRE-2017: Indian Native Language Identification using Support Vector Machines and Ensemble approach

This paper describes the systems submitted by our team for the Indian Native Language Identification (INLI) task held in conjunction with FIRE 2017. Native Language Identification (NLI) is an important task with applications in areas such as social-media analysis, authorship identification, second language acquisition and forensic investigation. We submitted two systems usin...

BMSCE_ISE@INLI-FIRE-2017: A simple n-gram based approach for Native Language Identification

Native Language Identification (NLI) aims to identify the native language (L1) of an author by analysing text they have written in another language (L2). NLI is often implemented as a supervised classification problem. In this paper, we report an NLI system implemented using character tri-grams, word uni-grams and bigrams with a linear classifier, Support Vector Machines (SVM). The work demons...

Bharathi SSN @ INLI-FIRE-2017: SVM based approach for Indian Native Language Identification

Native Language Identification (NLI) is the task of identifying the native language of a writer or a speaker by analyzing their text. NLI can be important for a number of applications. In forensic linguistics, native language is often used as an important feature for authorship profiling and identification. Nowadays due to the huge usage of social media sites and online interactions, receiving ...

Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification

This overview paper describes the first shared task on Indian Native Language Identification (INLI), organized at FIRE 2017. Given a corpus of English comments from the Facebook pages of various newspapers, the objective of the task is to identify the native language among the following six Indian languages: Bengali, Hindi, Kannada, Malayalam, Tamil, and Telugu. Altogether, 26 approaches ...

SSN_NLP@INLI-FIRE-2017: A Neural Network Approach to Indian Native Language Identification

Native Language Identification (NLI) is the process of identifying the native language of non-native speakers based on their speech or writing. It has several applications, namely authorship profiling and identification, forensic analysis, second language identification, and educational applications. English is one of the prominent languages used by most non-native English speakers in the world. Th...

Publication date: 2017